[Performance] Faster split, chunk and unbind #563

Merged
merged 3 commits into from
Nov 21, 2023

vmoens (Contributor) commented Nov 21, 2023

We pre-compute the batch-size to accelerate split, chunk and unbind.
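
As a rough illustration of what pre-computing the batch size can mean here (a sketch under stated assumptions, not the code from this PR): when a container with a known batch size is split or unbound along a dimension, the batch size of every resulting piece follows from the arguments alone, so it can be computed once up front instead of being re-derived for each piece. The helper names `split_batch_sizes` and `unbind_batch_size` are hypothetical and use plain Python tuples rather than the library's own types.

```python
from typing import List, Sequence, Tuple, Union


def _normalize_dim(dim: int, ndim: int) -> int:
    # Map a possibly negative dim onto the range [0, ndim).
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} is out of range for {ndim} batch dimensions")
    return dim % ndim


def split_batch_sizes(
    batch_size: Sequence[int],
    split_size: Union[int, Sequence[int]],
    dim: int,
) -> List[Tuple[int, ...]]:
    """Batch size of each piece obtained by splitting along `dim` (hypothetical helper)."""
    dim = _normalize_dim(dim, len(batch_size))
    length = batch_size[dim]
    if isinstance(split_size, int):
        if split_size <= 0:
            raise ValueError("split_size must be positive")
        # Equal pieces, plus a smaller trailing piece if the length is not divisible.
        full, rest = divmod(length, split_size)
        sizes = [split_size] * full + ([rest] if rest else [])
    else:
        sizes = list(split_size)
        if sum(sizes) != length:
            raise ValueError("split sizes must sum to the length of the split dimension")
    head, tail = tuple(batch_size[:dim]), tuple(batch_size[dim + 1:])
    return [head + (s,) + tail for s in sizes]


def unbind_batch_size(batch_size: Sequence[int], dim: int) -> Tuple[int, ...]:
    """Batch size shared by every piece after unbinding along `dim` (hypothetical helper)."""
    dim = _normalize_dim(dim, len(batch_size))
    return tuple(batch_size[:dim]) + tuple(batch_size[dim + 1:])
```

With these shapes known ahead of time, each output container could in principle be built directly with its batch size rather than inferring it from its contents; whether the PR does exactly this is an assumption.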

@facebook-github-bot added the CLA Signed label Nov 21, 2023
@vmoens marked this pull request as ready for review November 21, 2023 10:34

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 117. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 33.6610μs 12.5175μs 79.8879 KOps/s 79.4391 KOps/s $\color{#35bf28}+0.57\%$
test_plain_set_stack_nested 0.1403ms 0.1167ms 8.5675 KOps/s 8.5356 KOps/s $\color{#35bf28}+0.37\%$
test_plain_set_nested_inplace 33.1110μs 14.8945μs 67.1389 KOps/s 66.5346 KOps/s $\color{#35bf28}+0.91\%$
test_plain_set_stack_nested_inplace 0.1731ms 0.1397ms 7.1575 KOps/s 7.0585 KOps/s $\color{#35bf28}+1.40\%$
test_items 23.4510μs 4.6897μs 213.2336 KOps/s 212.0739 KOps/s $\color{#35bf28}+0.55\%$
test_items_nested 0.4380ms 0.3380ms 2.9587 KOps/s 2.9765 KOps/s $\color{#d91a1a}-0.60\%$
test_items_nested_locked 0.3700ms 0.3370ms 2.9674 KOps/s 2.9792 KOps/s $\color{#d91a1a}-0.40\%$
test_items_nested_leaf 0.2251ms 0.1993ms 5.0173 KOps/s 5.0583 KOps/s $\color{#d91a1a}-0.81\%$
test_items_stack_nested 1.5176ms 1.4409ms 694.0186 Ops/s 681.2262 Ops/s $\color{#35bf28}+1.88\%$
test_items_stack_nested_leaf 1.3324ms 1.2764ms 783.4259 Ops/s 776.1847 Ops/s $\color{#35bf28}+0.93\%$
test_items_stack_nested_locked 0.8573ms 0.7990ms 1.2515 KOps/s 1.2144 KOps/s $\color{#35bf28}+3.06\%$
test_keys 22.1900μs 4.7832μs 209.0662 KOps/s 218.8575 KOps/s $\color{#d91a1a}-4.47\%$
test_keys_nested 2.1324ms 91.2871μs 10.9545 KOps/s 10.9924 KOps/s $\color{#d91a1a}-0.35\%$
test_keys_nested_locked 0.1518ms 90.5441μs 11.0443 KOps/s 11.1087 KOps/s $\color{#d91a1a}-0.58\%$
test_keys_nested_leaf 43.1521ms 86.6093μs 11.5461 KOps/s 12.2827 KOps/s $\textbf{\color{#d91a1a}-6.00\%}$
test_keys_stack_nested 1.3257ms 1.2675ms 788.9477 Ops/s 789.8717 Ops/s $\color{#d91a1a}-0.12\%$
test_keys_stack_nested_leaf 1.3055ms 1.2498ms 800.1033 Ops/s 791.3357 Ops/s $\color{#35bf28}+1.11\%$
test_keys_stack_nested_locked 0.6658ms 0.6140ms 1.6288 KOps/s 1.6201 KOps/s $\color{#35bf28}+0.54\%$
test_values 10.0000μs 1.8812μs 531.5763 KOps/s 528.9723 KOps/s $\color{#35bf28}+0.49\%$
test_values_nested 67.4410μs 42.9906μs 23.2609 KOps/s 23.1716 KOps/s $\color{#35bf28}+0.39\%$
test_values_nested_locked 66.8710μs 42.8514μs 23.3364 KOps/s 23.1949 KOps/s $\color{#35bf28}+0.61\%$
test_values_nested_leaf 58.8610μs 37.1733μs 26.9011 KOps/s 26.6444 KOps/s $\color{#35bf28}+0.96\%$
test_values_stack_nested 1.1667ms 1.0963ms 912.1971 Ops/s 894.0568 Ops/s $\color{#35bf28}+2.03\%$
test_values_stack_nested_leaf 1.3514ms 1.0866ms 920.3333 Ops/s 909.1831 Ops/s $\color{#35bf28}+1.23\%$
test_values_stack_nested_locked 0.5327ms 0.4769ms 2.0969 KOps/s 2.0620 KOps/s $\color{#35bf28}+1.69\%$
test_membership 3.9383μs 0.9223μs 1.0843 MOps/s 1.0708 MOps/s $\color{#35bf28}+1.26\%$
test_membership_nested 36.1910μs 2.2041μs 453.7007 KOps/s 446.3732 KOps/s $\color{#35bf28}+1.64\%$
test_membership_nested_leaf 33.0255μs 2.0867μs 479.2270 KOps/s 463.7395 KOps/s $\color{#35bf28}+3.34\%$
test_membership_stacked_nested 37.2710μs 10.6214μs 94.1500 KOps/s 92.7750 KOps/s $\color{#35bf28}+1.48\%$
test_membership_stacked_nested_leaf 43.7110μs 10.9539μs 91.2915 KOps/s 93.1035 KOps/s $\color{#d91a1a}-1.95\%$
test_membership_nested_last 19.4710μs 4.6215μs 216.3823 KOps/s 216.9041 KOps/s $\color{#d91a1a}-0.24\%$
test_membership_nested_leaf_last 36.6310μs 4.6497μs 215.0670 KOps/s 218.4416 KOps/s $\color{#d91a1a}-1.54\%$
test_membership_stacked_nested_last 0.1796ms 0.1337ms 7.4778 KOps/s 7.4937 KOps/s $\color{#d91a1a}-0.21\%$
test_membership_stacked_nested_leaf_last 72.8920μs 12.6294μs 79.1804 KOps/s 78.3065 KOps/s $\color{#35bf28}+1.12\%$
test_nested_getleaf 35.8410μs 8.4265μs 118.6731 KOps/s 119.5739 KOps/s $\color{#d91a1a}-0.75\%$
test_nested_get 30.2700μs 7.9268μs 126.1546 KOps/s 126.0298 KOps/s $\color{#35bf28}+0.10\%$
test_stacked_getleaf 0.7861ms 0.5522ms 1.8110 KOps/s 1.8384 KOps/s $\color{#d91a1a}-1.49\%$
test_stacked_get 0.5675ms 0.5198ms 1.9239 KOps/s 1.9630 KOps/s $\color{#d91a1a}-1.99\%$
test_nested_getitemleaf 71.1810μs 8.4061μs 118.9613 KOps/s 118.8258 KOps/s $\color{#35bf28}+0.11\%$
test_nested_getitem 32.5500μs 7.9580μs 125.6603 KOps/s 125.5521 KOps/s $\color{#35bf28}+0.09\%$
test_stacked_getitemleaf 0.7634ms 0.5498ms 1.8187 KOps/s 1.8589 KOps/s $\color{#d91a1a}-2.16\%$
test_stacked_getitem 0.5698ms 0.5196ms 1.9246 KOps/s 1.9418 KOps/s $\color{#d91a1a}-0.89\%$
test_lock_nested 4.4341ms 0.4532ms 2.2064 KOps/s 2.7243 KOps/s $\textbf{\color{#d91a1a}-19.01\%}$
test_lock_stack_nested 72.7933ms 6.6375ms 150.6583 Ops/s 194.2667 Ops/s $\textbf{\color{#d91a1a}-22.45\%}$
test_unlock_nested 1.2895ms 0.4275ms 2.3390 KOps/s 2.4913 KOps/s $\textbf{\color{#d91a1a}-6.11\%}$
test_unlock_stack_nested 69.2938ms 7.3690ms 135.7041 Ops/s 165.9123 Ops/s $\textbf{\color{#d91a1a}-18.21\%}$
test_flatten_speed 0.5239ms 0.1895ms 5.2768 KOps/s 5.2724 KOps/s $\color{#35bf28}+0.08\%$
test_unflatten_speed 0.4053ms 0.3722ms 2.6867 KOps/s 2.6929 KOps/s $\color{#d91a1a}-0.23\%$
test_common_ops 1.0638ms 0.6085ms 1.6433 KOps/s 1.6005 KOps/s $\color{#35bf28}+2.68\%$
test_creation 37.4700μs 1.9440μs 514.4116 KOps/s 516.3511 KOps/s $\color{#d91a1a}-0.38\%$
test_creation_empty 36.9500μs 6.9358μs 144.1788 KOps/s 142.9254 KOps/s $\color{#35bf28}+0.88\%$
test_creation_nested_1 24.8300μs 9.6697μs 103.4155 KOps/s 103.4122 KOps/s $+0.00\%$
test_creation_nested_2 32.0710μs 12.1948μs 82.0023 KOps/s 82.0757 KOps/s $\color{#d91a1a}-0.09\%$
test_clone 96.7820μs 14.1987μs 70.4291 KOps/s 67.3505 KOps/s $\color{#35bf28}+4.57\%$
test_getitem[int] 28.1100μs 12.1707μs 82.1644 KOps/s 79.0210 KOps/s $\color{#35bf28}+3.98\%$
test_getitem[slice_int] 43.1010μs 23.6418μs 42.2980 KOps/s 34.9642 KOps/s $\textbf{\color{#35bf28}+20.97\%}$
test_getitem[range] 65.6310μs 39.4150μs 25.3711 KOps/s 20.5263 KOps/s $\textbf{\color{#35bf28}+23.60\%}$
test_getitem[tuple] 38.8200μs 20.3664μs 49.1004 KOps/s 38.5962 KOps/s $\textbf{\color{#35bf28}+27.22\%}$
test_getitem[list] 0.2955ms 36.2402μs 27.5937 KOps/s 21.4926 KOps/s $\textbf{\color{#35bf28}+28.39\%}$
test_setitem_dim[int] 45.7710μs 26.0464μs 38.3930 KOps/s 36.0922 KOps/s $\textbf{\color{#35bf28}+6.37\%}$
test_setitem_dim[slice_int] 65.3620μs 46.0694μs 21.7064 KOps/s 21.1707 KOps/s $\color{#35bf28}+2.53\%$
test_setitem_dim[range] 94.9610μs 62.4814μs 16.0048 KOps/s 15.5581 KOps/s $\color{#35bf28}+2.87\%$
test_setitem_dim[tuple] 60.7710μs 39.2350μs 25.4874 KOps/s 24.0508 KOps/s $\textbf{\color{#35bf28}+5.97\%}$
test_setitem 0.1282ms 18.1054μs 55.2323 KOps/s 53.5904 KOps/s $\color{#35bf28}+3.06\%$
test_set 0.1182ms 17.7211μs 56.4301 KOps/s 55.5688 KOps/s $\color{#35bf28}+1.55\%$
test_set_shared 2.8048ms 0.1007ms 9.9308 KOps/s 9.6422 KOps/s $\color{#35bf28}+2.99\%$
test_update 0.1025ms 21.9617μs 45.5339 KOps/s 44.4343 KOps/s $\color{#35bf28}+2.47\%$
test_update_nested 0.1155ms 31.3244μs 31.9240 KOps/s 31.4468 KOps/s $\color{#35bf28}+1.52\%$
test_set_nested 0.1022ms 18.6411μs 53.6450 KOps/s 51.1353 KOps/s $\color{#35bf28}+4.91\%$
test_set_nested_new 0.1130ms 23.7239μs 42.1517 KOps/s 41.4113 KOps/s $\color{#35bf28}+1.79\%$
test_select 0.1193ms 46.2195μs 21.6359 KOps/s 21.7670 KOps/s $\color{#d91a1a}-0.60\%$
test_to 76.4420μs 52.9571μs 18.8832 KOps/s 18.3328 KOps/s $\color{#35bf28}+3.00\%$
test_to_nonblocking 74.0010μs 35.2038μs 28.4060 KOps/s 27.4869 KOps/s $\color{#35bf28}+3.34\%$
test_unbind_speed 0.4744ms 0.3447ms 2.9007 KOps/s 3.7381 KOps/s $\textbf{\color{#d91a1a}-22.40\%}$
test_unbind_speed_stack0 62.1445ms 5.1957ms 192.4684 Ops/s 267.0491 Ops/s $\textbf{\color{#d91a1a}-27.93\%}$
test_unbind_speed_stack1 3.3485μs 0.5218μs 1.9165 MOps/s 1.8705 MOps/s $\color{#35bf28}+2.46\%$
test_split 54.1639ms 1.7984ms 556.0405 Ops/s 348.1993 Ops/s $\textbf{\color{#35bf28}+59.69\%}$
test_chunk 54.5577ms 1.7831ms 560.8108 Ops/s 355.8527 Ops/s $\textbf{\color{#35bf28}+57.60\%}$
test_creation[device0] 0.3976ms 0.3096ms 3.2303 KOps/s 3.1875 KOps/s $\color{#35bf28}+1.34\%$
test_creation[device1] 0.4783ms 0.3121ms 3.2042 KOps/s 3.1664 KOps/s $\color{#35bf28}+1.20\%$
test_creation_from_tensor 0.5908ms 0.3395ms 2.9459 KOps/s 2.9425 KOps/s $\color{#35bf28}+0.12\%$
test_add_one[memmap_tensor0] 68.3110μs 24.1442μs 41.4178 KOps/s 38.7398 KOps/s $\textbf{\color{#35bf28}+6.91\%}$
test_add_one[memmap_tensor1] 0.2194ms 74.3071μs 13.4577 KOps/s 13.2923 KOps/s $\color{#35bf28}+1.24\%$
test_contiguous[memmap_tensor0] 21.6710μs 5.7946μs 172.5737 KOps/s 163.8283 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_contiguous[memmap_tensor1] 49.7910μs 22.3540μs 44.7347 KOps/s 43.6534 KOps/s $\color{#35bf28}+2.48\%$
test_stack[memmap_tensor0] 48.9200μs 20.0195μs 49.9512 KOps/s 48.1817 KOps/s $\color{#35bf28}+3.67\%$
test_stack[memmap_tensor1] 0.1541ms 73.0036μs 13.6980 KOps/s 13.1806 KOps/s $\color{#35bf28}+3.92\%$
test_memmaptd_index 0.2634ms 0.2250ms 4.4450 KOps/s 4.2562 KOps/s $\color{#35bf28}+4.43\%$
test_memmaptd_index_astensor 0.3208ms 0.2840ms 3.5216 KOps/s 3.1810 KOps/s $\textbf{\color{#35bf28}+10.71\%}$
test_memmaptd_index_op 0.6126ms 0.5541ms 1.8047 KOps/s 1.7074 KOps/s $\textbf{\color{#35bf28}+5.70\%}$
test_reshape_pytree 0.2549ms 20.9664μs 47.6954 KOps/s 46.1799 KOps/s $\color{#35bf28}+3.28\%$
test_reshape_td 64.1910μs 31.0007μs 32.2573 KOps/s 31.4152 KOps/s $\color{#35bf28}+2.68\%$
test_view_pytree 39.3410μs 20.6280μs 48.4778 KOps/s 46.8664 KOps/s $\color{#35bf28}+3.44\%$
test_view_td 25.0800μs 4.0748μs 245.4115 KOps/s 247.7869 KOps/s $\color{#d91a1a}-0.96\%$
test_unbind_pytree 53.2910μs 25.9424μs 38.5469 KOps/s 37.4126 KOps/s $\color{#35bf28}+3.03\%$
test_unbind_td 83.8410μs 55.9009μs 17.8888 KOps/s 24.5093 KOps/s $\textbf{\color{#d91a1a}-27.01\%}$
test_split_pytree 38.3110μs 23.3933μs 42.7472 KOps/s 41.1151 KOps/s $\color{#35bf28}+3.97\%$
test_split_td 67.4510μs 42.7063μs 23.4157 KOps/s 14.7408 KOps/s $\textbf{\color{#35bf28}+58.85\%}$
test_add_pytree 0.1029ms 32.1250μs 31.1284 KOps/s 30.2911 KOps/s $\color{#35bf28}+2.76\%$
test_add_td 67.9910μs 44.0228μs 22.7155 KOps/s 21.2645 KOps/s $\textbf{\color{#35bf28}+6.82\%}$
test_distributed 25.9300μs 5.4703μs 182.8051 KOps/s 182.3007 KOps/s $\color{#35bf28}+0.28\%$
test_tdmodule 1.7622ms 18.1932μs 54.9657 KOps/s 59.5932 KOps/s $\textbf{\color{#d91a1a}-7.77\%}$
test_tdmodule_dispatch 0.1855ms 32.9261μs 30.3710 KOps/s 30.4001 KOps/s $\color{#d91a1a}-0.10\%$
test_tdseq 35.5410μs 19.6712μs 50.8357 KOps/s 49.7514 KOps/s $\color{#35bf28}+2.18\%$
test_tdseq_dispatch 58.0210μs 35.6586μs 28.0437 KOps/s 27.9625 KOps/s $\color{#35bf28}+0.29\%$
test_instantiation_functorch 1.7459ms 1.6657ms 600.3397 Ops/s 588.4496 Ops/s $\color{#35bf28}+2.02\%$
test_instantiation_td 1.7740ms 1.1704ms 854.4074 Ops/s 849.6794 Ops/s $\color{#35bf28}+0.56\%$
test_exec_functorch 0.2249ms 0.1578ms 6.3377 KOps/s 6.1359 KOps/s $\color{#35bf28}+3.29\%$
test_exec_td 0.2252ms 0.1487ms 6.7228 KOps/s 6.5616 KOps/s $\color{#35bf28}+2.46\%$
test_vmap_mlp_speed[True-True] 1.1356ms 1.0854ms 921.3252 Ops/s 929.2716 Ops/s $\color{#d91a1a}-0.86\%$
test_vmap_mlp_speed[True-False] 0.6956ms 0.6258ms 1.5979 KOps/s 1.6048 KOps/s $\color{#d91a1a}-0.43\%$
test_vmap_mlp_speed[False-True] 1.0469ms 0.9969ms 1.0031 KOps/s 1.0090 KOps/s $\color{#d91a1a}-0.59\%$
test_vmap_mlp_speed[False-False] 0.7063ms 0.5589ms 1.7893 KOps/s 1.7710 KOps/s $\color{#35bf28}+1.03\%$
test_vmap_transformer_speed[True-True] 12.8072ms 12.6555ms 79.0167 Ops/s 77.8207 Ops/s $\color{#35bf28}+1.54\%$
test_vmap_transformer_speed[True-False] 8.4560ms 8.3770ms 119.3748 Ops/s 119.9304 Ops/s $\color{#d91a1a}-0.46\%$
test_vmap_transformer_speed[False-True] 12.7538ms 12.6316ms 79.1668 Ops/s 79.4065 Ops/s $\color{#d91a1a}-0.30\%$
test_vmap_transformer_speed[False-False] 8.3672ms 8.3070ms 120.3801 Ops/s 121.2295 Ops/s $\color{#d91a1a}-0.70\%$
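
The Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. This reading is inferred from the numbers in the table, not stated by the bot. A minimal sketch, with a hypothetical helper name `pct_change`, checks it against two rows of the GPU table:

```python
def pct_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change in percent; both values must use the same unit."""
    return (ops_pr / ops_head - 1.0) * 100.0


# test_split (GPU): 556.0405 Ops/s on this PR vs 348.1993 Ops/s on HEAD
split_change = pct_change(556.0405, 348.1993)

# test_unbind_speed (GPU): 2.9007 KOps/s on this PR vs 3.7381 KOps/s on HEAD
unbind_change = pct_change(2.9007, 3.7381)

print(f"test_split: {split_change:+.2f}%")
print(f"test_unbind_speed: {unbind_change:+.2f}%")
```

Under this reading, a positive value means higher throughput than HEAD and a negative value means a slowdown.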


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 107. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 29.4450μs 16.4963μs 60.6198 KOps/s 60.2416 KOps/s $\color{#35bf28}+0.63\%$
test_plain_set_stack_nested 0.2066ms 0.1512ms 6.6128 KOps/s 6.7255 KOps/s $\color{#d91a1a}-1.68\%$
test_plain_set_nested_inplace 41.5780μs 19.4627μs 51.3803 KOps/s 51.8603 KOps/s $\color{#d91a1a}-0.93\%$
test_plain_set_stack_nested_inplace 0.3429ms 0.1773ms 5.6416 KOps/s 5.6978 KOps/s $\color{#d91a1a}-0.99\%$
test_items 20.9090μs 2.4024μs 416.2581 KOps/s 365.2251 KOps/s $\textbf{\color{#35bf28}+13.97\%}$
test_items_nested 0.5016ms 0.2715ms 3.6832 KOps/s 3.7454 KOps/s $\color{#d91a1a}-1.66\%$
test_items_nested_locked 1.0790ms 0.2719ms 3.6783 KOps/s 3.7553 KOps/s $\color{#d91a1a}-2.05\%$
test_items_nested_leaf 0.3192ms 0.1649ms 6.0651 KOps/s 6.0511 KOps/s $\color{#35bf28}+0.23\%$
test_items_stack_nested 1.7411ms 1.4366ms 696.0915 Ops/s 687.8216 Ops/s $\color{#35bf28}+1.20\%$
test_items_stack_nested_leaf 2.0609ms 1.3095ms 763.6699 Ops/s 755.1914 Ops/s $\color{#35bf28}+1.12\%$
test_items_stack_nested_locked 1.8480ms 0.7590ms 1.3175 KOps/s 1.3045 KOps/s $\color{#35bf28}+0.99\%$
test_keys 23.1530μs 3.7992μs 263.2109 KOps/s 260.8091 KOps/s $\color{#35bf28}+0.92\%$
test_keys_nested 0.5315ms 0.1408ms 7.1002 KOps/s 6.0885 KOps/s $\textbf{\color{#35bf28}+16.62\%}$
test_keys_nested_locked 0.2071ms 0.1407ms 7.1076 KOps/s 7.2719 KOps/s $\color{#d91a1a}-2.26\%$
test_keys_nested_leaf 0.3004ms 0.1406ms 7.1121 KOps/s 7.1853 KOps/s $\color{#d91a1a}-1.02\%$
test_keys_stack_nested 1.5541ms 1.3807ms 724.2662 Ops/s 735.5281 Ops/s $\color{#d91a1a}-1.53\%$
test_keys_stack_nested_leaf 1.5083ms 1.3747ms 727.4133 Ops/s 724.8835 Ops/s $\color{#35bf28}+0.35\%$
test_keys_stack_nested_locked 1.1801ms 0.6754ms 1.4807 KOps/s 1.4612 KOps/s $\color{#35bf28}+1.34\%$
test_values 5.1172μs 1.1384μs 878.3946 KOps/s 844.2885 KOps/s $\color{#35bf28}+4.04\%$
test_values_nested 87.2730μs 49.2701μs 20.2963 KOps/s 20.2073 KOps/s $\color{#35bf28}+0.44\%$
test_values_nested_locked 0.1080ms 49.0445μs 20.3896 KOps/s 20.2569 KOps/s $\color{#35bf28}+0.66\%$
test_values_nested_leaf 58.3890μs 43.7093μs 22.8784 KOps/s 22.6080 KOps/s $\color{#35bf28}+1.20\%$
test_values_stack_nested 1.9029ms 1.1573ms 864.0748 Ops/s 823.5532 Ops/s $\color{#35bf28}+4.92\%$
test_values_stack_nested_leaf 1.8139ms 1.1471ms 871.7794 Ops/s 855.2624 Ops/s $\color{#35bf28}+1.93\%$
test_values_stack_nested_locked 0.6453ms 0.5055ms 1.9784 KOps/s 1.9120 KOps/s $\color{#35bf28}+3.47\%$
test_membership 16.5710μs 1.3591μs 735.7675 KOps/s 669.5632 KOps/s $\textbf{\color{#35bf28}+9.89\%}$
test_membership_nested 20.8690μs 2.8008μs 357.0355 KOps/s 356.2264 KOps/s $\color{#35bf28}+0.23\%$
test_membership_nested_leaf 26.8300μs 2.7827μs 359.3693 KOps/s 358.9039 KOps/s $\color{#35bf28}+0.13\%$
test_membership_stacked_nested 27.4620μs 11.6500μs 85.8366 KOps/s 85.2097 KOps/s $\color{#35bf28}+0.74\%$
test_membership_stacked_nested_leaf 40.2750μs 11.7830μs 84.8683 KOps/s 85.1776 KOps/s $\color{#d91a1a}-0.36\%$
test_membership_nested_last 34.0710μs 5.9839μs 167.1147 KOps/s 169.8950 KOps/s $\color{#d91a1a}-1.64\%$
test_membership_nested_leaf_last 23.0020μs 5.9726μs 167.4320 KOps/s 169.6993 KOps/s $\color{#d91a1a}-1.34\%$
test_membership_stacked_nested_last 0.3506ms 0.1727ms 5.7913 KOps/s 5.9644 KOps/s $\color{#d91a1a}-2.90\%$
test_membership_stacked_nested_leaf_last 34.6150μs 13.8368μs 72.2710 KOps/s 73.2589 KOps/s $\color{#d91a1a}-1.35\%$
test_nested_getleaf 29.5050μs 10.8681μs 92.0125 KOps/s 93.4101 KOps/s $\color{#d91a1a}-1.50\%$
test_nested_get 37.0090μs 10.7388μs 93.1198 KOps/s 98.5246 KOps/s $\textbf{\color{#d91a1a}-5.49\%}$
test_stacked_getleaf 1.0345ms 0.6141ms 1.6284 KOps/s 1.6111 KOps/s $\color{#35bf28}+1.07\%$
test_stacked_get 1.2884ms 0.5860ms 1.7065 KOps/s 1.6950 KOps/s $\color{#35bf28}+0.68\%$
test_nested_getitemleaf 30.9680μs 10.6682μs 93.7365 KOps/s 94.4220 KOps/s $\color{#d91a1a}-0.73\%$
test_nested_getitem 41.0760μs 10.2695μs 97.3759 KOps/s 99.8283 KOps/s $\color{#d91a1a}-2.46\%$
test_stacked_getitemleaf 0.9649ms 0.6083ms 1.6438 KOps/s 1.6039 KOps/s $\color{#35bf28}+2.49\%$
test_stacked_getitem 0.6948ms 0.5790ms 1.7270 KOps/s 1.6907 KOps/s $\color{#35bf28}+2.15\%$
test_lock_nested 55.1172ms 0.5444ms 1.8368 KOps/s 2.6116 KOps/s $\textbf{\color{#d91a1a}-29.67\%}$
test_lock_stack_nested 69.2491ms 7.6829ms 130.1596 Ops/s 267.2835 Ops/s $\textbf{\color{#d91a1a}-51.30\%}$
test_unlock_nested 58.4081ms 0.5046ms 1.9817 KOps/s 2.5167 KOps/s $\textbf{\color{#d91a1a}-21.26\%}$
test_unlock_stack_nested 63.7122ms 7.3987ms 135.1587 Ops/s 169.2324 Ops/s $\textbf{\color{#d91a1a}-20.13\%}$
test_flatten_speed 0.5506ms 0.2720ms 3.6762 KOps/s 3.6989 KOps/s $\color{#d91a1a}-0.61\%$
test_unflatten_speed 0.5847ms 0.4810ms 2.0789 KOps/s 2.1018 KOps/s $\color{#d91a1a}-1.09\%$
test_common_ops 4.1066ms 0.7015ms 1.4256 KOps/s 1.4123 KOps/s $\color{#35bf28}+0.94\%$
test_creation 25.2170μs 2.4023μs 416.2719 KOps/s 418.7368 KOps/s $\color{#d91a1a}-0.59\%$
test_creation_empty 23.1930μs 8.5691μs 116.6985 KOps/s 109.9100 KOps/s $\textbf{\color{#35bf28}+6.18\%}$
test_creation_nested_1 37.0590μs 12.4023μs 80.6305 KOps/s 74.6273 KOps/s $\textbf{\color{#35bf28}+8.04\%}$
test_creation_nested_2 38.5120μs 15.6443μs 63.9210 KOps/s 61.0338 KOps/s $\color{#35bf28}+4.73\%$
test_clone 0.1057ms 13.2143μs 75.6757 KOps/s 73.5106 KOps/s $\color{#35bf28}+2.95\%$
test_getitem[int] 31.3490μs 12.4660μs 80.2181 KOps/s 75.7262 KOps/s $\textbf{\color{#35bf28}+5.93\%}$
test_getitem[slice_int] 58.9100μs 24.5722μs 40.6964 KOps/s 31.1846 KOps/s $\textbf{\color{#35bf28}+30.50\%}$
test_getitem[range] 94.2150μs 43.6757μs 22.8960 KOps/s 17.7976 KOps/s $\textbf{\color{#35bf28}+28.65\%}$
test_getitem[tuple] 45.9660μs 19.6171μs 50.9761 KOps/s 41.4888 KOps/s $\textbf{\color{#35bf28}+22.87\%}$
test_getitem[list] 0.2406ms 39.0110μs 25.6338 KOps/s 19.6797 KOps/s $\textbf{\color{#35bf28}+30.25\%}$
test_setitem_dim[int] 68.6180μs 27.9225μs 35.8134 KOps/s 34.6681 KOps/s $\color{#35bf28}+3.30\%$
test_setitem_dim[slice_int] 89.4670μs 51.3014μs 19.4926 KOps/s 18.5518 KOps/s $\textbf{\color{#35bf28}+5.07\%}$
test_setitem_dim[range] 0.1113ms 72.9033μs 13.7168 KOps/s 13.6633 KOps/s $\color{#35bf28}+0.39\%$
test_setitem_dim[tuple] 82.8550μs 41.2980μs 24.2143 KOps/s 23.5141 KOps/s $\color{#35bf28}+2.98\%$
test_setitem 80.8810μs 18.5265μs 53.9768 KOps/s 51.0460 KOps/s $\textbf{\color{#35bf28}+5.74\%}$
test_set 86.2510μs 17.7731μs 56.2647 KOps/s 52.9183 KOps/s $\textbf{\color{#35bf28}+6.32\%}$
test_set_shared 1.8165ms 0.1384ms 7.2268 KOps/s 7.2596 KOps/s $\color{#d91a1a}-0.45\%$
test_update 90.3480μs 23.9673μs 41.7235 KOps/s 40.7637 KOps/s $\color{#35bf28}+2.35\%$
test_update_nested 88.1040μs 34.1754μs 29.2608 KOps/s 28.2487 KOps/s $\color{#35bf28}+3.58\%$
test_set_nested 85.8000μs 19.7926μs 50.5240 KOps/s 48.7807 KOps/s $\color{#35bf28}+3.57\%$
test_set_nested_new 0.1215ms 26.8921μs 37.1856 KOps/s 37.5265 KOps/s $\color{#d91a1a}-0.91\%$
test_select 0.1269ms 51.2621μs 19.5076 KOps/s 19.3639 KOps/s $\color{#35bf28}+0.74\%$
test_unbind_speed 0.6532ms 0.3708ms 2.6966 KOps/s 3.7486 KOps/s $\textbf{\color{#d91a1a}-28.06\%}$
test_unbind_speed_stack0 63.9701ms 5.3125ms 188.2340 Ops/s 255.1592 Ops/s $\textbf{\color{#d91a1a}-26.23\%}$
test_unbind_speed_stack1 1.8705μs 0.6338μs 1.5777 MOps/s 1.5348 MOps/s $\color{#35bf28}+2.80\%$
test_split 2.0059ms 1.6426ms 608.7770 Ops/s 324.1649 Ops/s $\textbf{\color{#35bf28}+87.80\%}$
test_chunk 56.4894ms 1.7453ms 572.9645 Ops/s 340.6940 Ops/s $\textbf{\color{#35bf28}+68.18\%}$
test_creation[device0] 0.3714ms 0.2896ms 3.4530 KOps/s 3.4066 KOps/s $\color{#35bf28}+1.36\%$
test_creation_from_tensor 3.1503ms 0.3259ms 3.0688 KOps/s 3.0103 KOps/s $\color{#35bf28}+1.94\%$
test_add_one[memmap_tensor0] 67.5360μs 25.3178μs 39.4978 KOps/s 38.7766 KOps/s $\color{#35bf28}+1.86\%$
test_contiguous[memmap_tensor0] 2.7530ms 5.8187μs 171.8603 KOps/s 176.8843 KOps/s $\color{#d91a1a}-2.84\%$
test_stack[memmap_tensor0] 94.9770μs 18.6182μs 53.7110 KOps/s 52.5598 KOps/s $\color{#35bf28}+2.19\%$
test_memmaptd_index 0.4052ms 0.1908ms 5.2416 KOps/s 5.3140 KOps/s $\color{#d91a1a}-1.36\%$
test_memmaptd_index_astensor 0.3947ms 0.2583ms 3.8715 KOps/s 3.9855 KOps/s $\color{#d91a1a}-2.86\%$
test_memmaptd_index_op 1.2280ms 0.4988ms 2.0047 KOps/s 2.0104 KOps/s $\color{#d91a1a}-0.28\%$
test_reshape_pytree 0.2863ms 23.1407μs 43.2139 KOps/s 42.6054 KOps/s $\color{#35bf28}+1.43\%$
test_reshape_td 72.0140μs 32.6757μs 30.6037 KOps/s 30.3858 KOps/s $\color{#35bf28}+0.72\%$
test_view_pytree 59.2710μs 22.9230μs 43.6244 KOps/s 42.9537 KOps/s $\color{#35bf28}+1.56\%$
test_view_td 21.5000μs 4.9208μs 203.2199 KOps/s 202.4707 KOps/s $\color{#35bf28}+0.37\%$
test_unbind_pytree 62.4160μs 26.0819μs 38.3408 KOps/s 38.2756 KOps/s $\color{#35bf28}+0.17\%$
test_unbind_td 0.1174ms 58.5955μs 17.0662 KOps/s 24.5175 KOps/s $\textbf{\color{#d91a1a}-30.39\%}$
test_split_pytree 67.9170μs 25.9602μs 38.5204 KOps/s 38.1016 KOps/s $\color{#35bf28}+1.10\%$
test_split_td 0.1084ms 45.4651μs 21.9949 KOps/s 13.3940 KOps/s $\textbf{\color{#35bf28}+64.21\%}$
test_add_pytree 87.7740μs 31.8174μs 31.4293 KOps/s 30.7939 KOps/s $\color{#35bf28}+2.06\%$
test_add_td 99.4960μs 45.6419μs 21.9097 KOps/s 21.2069 KOps/s $\color{#35bf28}+3.31\%$
test_distributed 26.7900μs 6.0052μs 166.5235 KOps/s 165.8579 KOps/s $\color{#35bf28}+0.40\%$
test_tdmodule 0.1009ms 21.8140μs 45.8420 KOps/s 42.4383 KOps/s $\textbf{\color{#35bf28}+8.02\%}$
test_tdmodule_dispatch 0.1776ms 40.5871μs 24.6384 KOps/s 25.0303 KOps/s $\color{#d91a1a}-1.57\%$
test_tdseq 0.3550ms 25.3614μs 39.4300 KOps/s 40.0340 KOps/s $\color{#d91a1a}-1.51\%$
test_tdseq_dispatch 0.4215ms 44.3986μs 22.5232 KOps/s 22.8199 KOps/s $\color{#d91a1a}-1.30\%$
test_instantiation_functorch 1.6955ms 1.2944ms 772.5562 Ops/s 772.6788 Ops/s $\color{#d91a1a}-0.02\%$
test_instantiation_td 1.7365ms 1.0075ms 992.5636 Ops/s 989.9567 Ops/s $\color{#35bf28}+0.26\%$
test_exec_functorch 0.2474ms 0.1467ms 6.8147 KOps/s 6.7463 KOps/s $\color{#35bf28}+1.01\%$
test_exec_td 0.2161ms 0.1405ms 7.1153 KOps/s 6.9240 KOps/s $\color{#35bf28}+2.76\%$
test_vmap_mlp_speed[True-True] 1.0563ms 0.8762ms 1.1414 KOps/s 1.1250 KOps/s $\color{#35bf28}+1.45\%$
test_vmap_mlp_speed[True-False] 0.7325ms 0.4692ms 2.1314 KOps/s 2.1245 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_mlp_speed[False-True] 1.5075ms 0.7640ms 1.3088 KOps/s 1.2910 KOps/s $\color{#35bf28}+1.38\%$
test_vmap_mlp_speed[False-False] 0.6678ms 0.3870ms 2.5837 KOps/s 2.5871 KOps/s $\color{#d91a1a}-0.13\%$

vmoens (Contributor, Author) commented Nov 21, 2023

Incidentally, we also fix a bug in unbind; the fix makes unbind slightly slower than it used to be...

@vmoens merged commit 3689afa into main Nov 21, 2023
@vmoens deleted the faster-unbind branch November 21, 2023 11:03
Labels: CLA Signed, Performance